The TextPro Tool Suite
Abstract
We present TextPro, a suite of modular Natural Language Processing (NLP) tools for the analysis of Italian and English texts. The suite was designed to integrate and reuse state-of-the-art NLP components developed by researchers at FBK. The current version provides functions ranging from tokenization to chunking and Named Entity Recognition (NER). The system's architecture is organized as a pipeline of processors: each stage accepts data either from the initial input or from the output of a previous stage, executes a specific task, and passes the resulting data to the next stage or to the output of the pipeline. TextPro achieved the best results in the Italian NER and Italian PoS tagging tasks at EVALITA 2007. When tested on a number of other standard English benchmarks, TextPro performed on par with state-of-the-art systems. Distributions for Linux, Solaris and Windows are available for both research and commercial purposes. A web-service version of the system is under development.
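The pipeline-of-processors pattern described above can be illustrated with a minimal sketch. This is not TextPro's actual implementation or API: the function names (`tokenize`, `tag_capitalized`, `run_pipeline`) are hypothetical, and the two stages are toy stand-ins for real tokenization and NER components. Each stage receives the document produced so far, adds its own annotation layer, and hands the result to the next stage.

```python
from typing import Any, Callable, Dict, List

# Hypothetical document representation: a dict that each stage enriches
Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    # Toy tokenizer (assumption): whitespace split, stripping edge punctuation
    tokens = [t.strip(".,;:!?") for t in doc["text"].split()]
    return {**doc, "tokens": [t for t in tokens if t]}


def tag_capitalized(doc: Doc) -> Doc:
    # Toy stand-in for an NER stage (assumption): marks capitalized
    # tokens that are not sentence-initial as entities
    tags = []
    for i, tok in enumerate(doc["tokens"]):
        tags.append("ENT" if i > 0 and tok[:1].isupper() else "O")
    return {**doc, "ner": tags}


def run_pipeline(stages: List[Stage], text: str) -> Doc:
    doc: Doc = {"text": text}  # initial input of the pipeline
    for stage in stages:
        doc = stage(doc)  # output of one stage becomes input of the next
    return doc  # output of the last stage is the pipeline output


result = run_pipeline([tokenize, tag_capitalized], "TextPro was developed at FBK in Trento.")
print(result["tokens"])
print(result["ner"])
```

Because every stage shares the same signature (document in, enriched document out), stages can be added, removed, or reordered without changing the others, which is the modularity the abstract emphasizes.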
Publication date: 2008